Investigation of the underlying physics or biology from empirical data requires a quantifiable notion of similarity: when do two observed data sets indicate nearly identical generating processes, and when do they not? The discriminating characteristics to look for in data are often determined by heuristics designed by experts, e.g., distinct shapes of "folded" light curves may be used as "features" to classify variable stars, while determination of pathological brain states might require a Fourier analysis of brainwave activity. Finding good features is non-trivial. Here, we propose a universal solution to this problem: we delineate a principle for quantifying similarity between sources of arbitrary data streams, without a priori knowledge, features, or training. We uncover an algebraic structure on a space of symbolic models for quantized data, and show that such stochastic generators may be added and uniquely inverted, and that a model and its inverse always sum to the generator of flat white noise. Therefore, every data stream has an anti-stream: data generated by the inverse model. Similarity between two streams, then, is the degree to which one, when summed to the other's anti-stream, mutually annihilates all statistical structure to noise. We call this data smashing. We present diverse applications, including disambiguation of brainwaves pertaining to epileptic seizures, detection of anomalous cardiac rhythms, and classification of astronomical objects from raw photometry. In our examples, the data smashing principle, without access to any domain knowledge, meets or exceeds the performance of specialized algorithms tuned by domain experts.
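The summation-and-annihilation idea can be illustrated on binary symbol streams. The sketch below is a simplified assumption for illustration only, not the paper's construction: `sum_streams` keeps positions where two streams agree, and `deviation_from_noise` is a crude flatness measure (maximum deviation of symbol frequencies from uniform). The paper's actual operations act on probabilistic automata inferred from quantized data, not on raw symbol frequencies.

```python
import random

def sum_streams(s1, s2):
    # Toy "stream summation" (an assumption for illustration):
    # keep a symbol only at positions where the two streams agree.
    return [a for a, b in zip(s1, s2) if a == b]

def deviation_from_noise(stream, alphabet=(0, 1)):
    # Crude flatness measure: maximum deviation of symbol frequencies
    # from the uniform distribution; 0.0 means flat white noise.
    n = len(stream)
    if n == 0:
        return 1.0
    return max(abs(stream.count(a) / n - 1.0 / len(alphabet)) for a in alphabet)

random.seed(0)
# A biased stream and an oppositely biased stream standing in for its
# "anti-stream": summing them leaves something close to flat noise,
# while summing two streams with the same bias does not.
biased = [1 if random.random() < 0.8 else 0 for _ in range(20000)]
anti   = [1 if random.random() < 0.2 else 0 for _ in range(20000)]

print(deviation_from_noise(sum_streams(biased, anti)))          # near 0: structure annihilated
print(deviation_from_noise(sum_streams(biased, biased[::-1])))  # large: bias survives
```

The agreement-selection rule makes the first result intuitive: positions where both streams emit 1 (probability 0.8 × 0.2) are exactly as likely as positions where both emit 0 (0.2 × 0.8), so the surviving symbols are uniform.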